ABSTRACT

What is meant by "accountability" varies a great
deal. It is not, however, the tools such as merit salary plans,
voucher plans, and management techniques that are used to achieve
accountability. Accountability has from its earliest days been tied
to testing. In discussing testing, it is necessary to discuss the
pros and cons of standardized, or norm-referenced, tests and of
criterion-referenced tests; to consider the numerous arguments
against testing in general; and to examine the suggestions for
alternatives to the usual methods of assessing student achievement.
An administrator faced with the decision of what methods of
evaluation to use for accountability will find that there are no
easy answers. Most authorities on testing seem to agree that
traditional standardized testing is not adequate. Yet there is
still a great deal of disagreement about which other methods can do
the job best. It seems clear that, for the time being at least, all
the best methods of assessment and evaluation are going to involve
a great deal of time and money. The method of evaluation chosen
depends on one's definition of accountability, which in turn
depends on one's idea of what good education is. (Author/IRT)
FOREWORD



Both the National Association of Elementary School
Principals and the ERIC Clearinghouse on Educational
Management are pleased to continue the School Leadership Digest,
with a second series of reports designed to offer school leaders
essential information on a wide range of critical concerns in
education.

The School Leadership Digest is a series of monthly reports
on top priority issues in education. At a time when decisions
in education must be made on the basis of increasingly
complex information, the Digest provides school administrators
with concise, readable analyses of the most important trends
in schools today and points up the practical
implications of major research findings.

By special cooperative arrangement, the series draws on
the extensive research facilities and expertise of the ERIC
Clearinghouse on Educational Management. The titles in the
series were planned and developed cooperatively by both
organizations. Utilizing the resources of the ERIC network,
the Clearinghouse is responsible for researching the topics
and preparing the copy for publication by NAESP.

The author of this report, Jo Ann Mazzarella, is employed
by the Clearinghouse as a research analyst and writer.



Paul L. Houts 
Director of Publications 
NAESP 



Stuart C. Smith
Assistant Director and Editor
ERIC/CEM



INTRODUCTION: THE PROBLEM OF DEFINITIONS 



The task of implementing accountability programs can fill
administrators with high enthusiasm or deep despair:
enthusiasm when accountability seems to promise a truly effective
way to improve the education in their schools; despair when
accountability sounds like mere empty idealism, impossible
to implement.

One step toward changing accountability from an ideal to 
a reality is choosing some method of determining whether 
educational goals have been reached. This usually means 
choosing methods to measure student performance. Which 
methods of assessment are best? The answer to this question
depends to some extent on the meaning of accountability. 

The term was first used in regard to education in 1969
when Leon Lessinger, as Associate Commissioner of
Education, came up with an idea that seemed as reasonable as it
was novel: that grant seekers should specify precisely the
intended educational outcomes and costs of their projects. In
addition, those receiving grants were to be audited to see
whether they had indeed achieved these outcomes within the
specified costs.

This rather limited concept expanded to become much
broader in meaning, as is evidenced in this definition by
Lessinger, Parnell, and Kaufman:

Accountability in education means just what its dictionary
definition says it means: responsibility. If you are held
accountable for something, you are responsible for it, answerable
to someone about it. In education, accountability means that
educators of all kinds should be answerable to parents for how
effectively their children are being taught and answerable to
taxpayers for how usefully their money is being spent.

Accountability caught on immediately in America and has
had enormous influence on American educational theory.
Designs for programs can now be found in all subject areas from
foreign language to vocational education, in kindergarten
through high school, and there are those who predict that
accountability will someday be a part of all learning and
teaching that goes on in America's schools.

Lessinger has estimated that since the appearance of his
first article on accountability in 1969 at least 4,000
references dealing with accountability have been published. Since
everybody is talking about accountability, it would appear
that everybody is talking about the same thing, but this
assumption couldn't be further from the truth. The term has a
myriad of meanings, depending on who is using it.

The definition formulated by Lessinger, Parnell, and
Kaufman is broad enough to include what most people mean by
accountability, but many other more specific definitions have
been formulated. The core of most of these definitions is
how they answer the following questions: Who is
accountable? Accountable to whom? Accountable for what? The table
indicates some of the answers that have been offered.

Who is accountable?              To whom?                         For what?

Teachers                         Children and parents             Specifying costs (both past and future)
Principals                       The teaching profession          Wise spending
Schools                          The school board                 Specifying educational goals
Superintendents                  State departments of education   Achievement of goals
School boards                    State or federal legislators     Students' acquisition of basic skills
Local school systems                                              Reporting to the public
State departments of education                                    Creating a suitable educational environment
Paid contractors                                                  Behaving professionally
                                                                  Educational input or process
                                                                  Helping to create intelligent citizens
To have a complete view of the morass of meanings that
surrounds accountability (and to be able to begin to extricate
ourselves from that morass), we must examine yet another
way of looking at the concept. Many seem to see
accountability as synonymous with the methods employed to achieve
it. For example, in the past when many educators spoke of
accountability they meant performance contracting. Other
writers and educators may actually be referring to things like
merit salary programs, Jencks' voucher plan, or systems
management techniques like PPBES. Our first step out of the
morass is to remember that these systems are merely
methods; they do not define accountability but are, as Lessinger
and his associates pointed out in a 1973 volume, merely tools
for the achievement of accountability.

The next step is to realize that the definitions reflect the
differences in people's ideas about what effective education
is. As long as educators continue to argue this issue (and let
us hope they always will), they will continue to disagree
about the definition of educational accountability. As
Lessinger himself concluded in a published interview in April
1975, "Accountability is not defined yet." It will be up
to teachers, administrators, and other educators to formulate
the definition as we learn more and more about our
educational responsibilities to children and how to achieve them.
STANDARDIZED TESTS:
WHAT DO THEY MEASURE? 



From accountability's earliest days, Lessinger and other
proponents have been calling for the improvement of testing
methods. In a 1970 volume he stated, "In place of relatively
primitive tests now widely used, we must develop measures
that are increasingly relevant and reliable." That was five
years ago. The question is, Do we now have effective means
of testing for accountability?

In the early days of accountability, during the heyday of
performance contracting, contractors were paid almost
exclusively according to students' gain scores on widely used
standardized tests like the Metropolitan Achievement Test or
the Iowa Test of Basic Skills. Standardized tests are used
when the definition of accountability includes specifying and
achieving educational goals concerning student performance
in cognitive subject areas. When these tests are used, the goals
have been stated in terms of comparisons; that is, students'
performance is considered adequate if it compares favorably
to the performance of most students in the United States.
Like the term accountability, the term standardized testing
has many meanings. A report from the Association of
California School Administrators explains, "For some, it merely
means tests with norms. For others, it means the test is (1)
published, (2) normed, (3) has explicit instructions for
administration, and (4) was constructed to meet technical
standards. Still others leave out requirement 1 or requirement 2
or both." In the pages that follow, a standardized test is a
test that fulfills all four of these requirements.

Standardized tests are also called norm-referenced or
psychometric tests. LeSage explains that a test becomes
norm-referenced by giving it to a representative national sample of
several thousand students. After these scores have been
spread over a bell curve, the score of any student taking the
test can be expressed by how the raw score compares with
those of the normative group. As Ebel puts it, "The aim of a
norm-referenced test . . . is to indicate how the attainments
of a particular pupil compare with those of his peers." This is
done by percentile ranks or grade equivalents.
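
The comparison with a normative group can be illustrated with a short sketch. It is only an illustration under stated assumptions: the function name percentile_rank, the ten-score norm group, and the convention of counting tied scores as half are all choices made here, not details taken from LeSage or Ebel, and test publishers may compute percentile ranks differently.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(raw_score, norm_scores):
    """Percent of the norm group scoring below raw_score, counting ties as half.

    This is one common convention; it is an assumption of this sketch.
    """
    ordered = sorted(norm_scores)
    below = bisect_left(ordered, raw_score)            # scores strictly lower
    ties = bisect_right(ordered, raw_score) - below    # scores exactly equal
    return 100.0 * (below + 0.5 * ties) / len(ordered)


# Hypothetical norm group of ten raw scores; a real normative sample
# would contain several thousand students.
norm_group = [12, 15, 18, 20, 22, 25, 27, 30, 33, 36]

print(percentile_rank(25, norm_group))
```

Note that the result says only where a raw score of 25 falls relative to the other scores in the group; it says nothing about which skills the student has mastered.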

The content of standardized tests, according to LeSage, is
determined by looking at popularly used textbooks and
existing programs. As Schiller and Murdoch point out,
standardized tests are designed to be a good measure of "what is
generally taught."

Advantages of Standardized Tests 

Probably one of the most important reasons standardized
tests are used in accountability assessment is that of
availability. Standardized tests have been widely used for years in
schools, and it is an easy thing to apply their scores for
accountability purposes. Another reason is their low cost;
standardized tests require less time and money than it would
cost to formulate and score a new test or new method of
assessment. Schiller and Murdoch note too that standardized
test scores such as grade equivalents are "easily understood
by the public and by school personnel."

Another purported benefit of standardized tests is their
quality. Although the validity and reliability of these tests for
the purposes of accountability have come under a great deal
of attack, proponents maintain that the tests are of higher
validity and reliability than a "homemade" test that has not
been perfected over the years by use on large numbers of
students. This is probably the strongest reason that led
Klitgaard, like many defenders of standardized tests, to conclude
that in spite of the imperfections of standardized tests, "it
is not clear what can take their place."

Averch and his colleagues note that standardized tests are
most useful when the function they are to perform is that of
comparing groups rather than individuals. Since they assess
what is generally taught, Ebel suggests they can help show
if local programs are teaching what most people consider
important. Weber points out too that comparing scores in
different curriculum areas over a state can tell a state if it has
problems in one particular curriculum area. He further notes
that standardized test scores can point out to a teacher or
school system the existence of problems in one particular
subject area. Also, the existence of national norms facilitates
comparisons of schools and programs on a national level.

Disadvantages of Standardized Tests

The use of most nationally-normed standardized tests to assess 
a given teacher's performance would be analogous to using a
bathroom scale to determine how many stamps to put on a 
letter. 

Alkin and Klein 



In the last several years, the use of standardized tests for
accountability programs has been severely attacked. Why is
this so? How can people reject tests that have been so
carefully developed and normed?

One answer is that these tests have been developed
primarily to compare students, not to assess their achievements.
Those whose definition of accountability includes students'
achievement of certain skills cannot measure their success
with standardized tests. The tests tell nothing about what
specific skills students have mastered; they merely tell how
students compare to each other.

Still, it would seem that those who are interested in
measuring success by how students compare to others across the
nation might find standardized tests useful. However, there
are other problems with the tests.

One problem is that standardized achievement tests may
actually assess native abilities like reasoning ability rather
than achievement. Some critics maintain that a test designed
to separate good students from poor students must necessarily
emphasize aptitude more than achievement. Others have
pointed out that scores in almost all subject areas depend
heavily on reading ability. It seems possible that we may be
assessing a school's or a teacher's effectiveness by using tests
that assess things that schools and teachers are not able to
teach.

Both Klein and Stake point out that standardized tests, by
attempting to test "what is generally taught," may not be
able to test what is being taught in a particular school or
classroom. Porter and McDaniels have stated that
"standardized tests are designed in such a way that they will not be
sensitive to many unique instructional interventions."
Teachers and administrators who are considering the adoption of
certain standardized tests must look carefully at the amount
of overlap between the test's and the school's learning
objectives.

Other critics maintain that standardized tests aren't even 
good measures of "what is generally taught." The consensus 
of the articles in the July/August 1975 National Elementary 
Principal (NEP) is that current standardized achievement 
tests are very poor measures of student performance in all 
subject areas. Taylor, in that issue, indicts elementary science 
tests for being "incorrect, misleading, skewed in emphasis
and irrelevant." To cite just one example from an issue filled
with similar examples, one test asks if a damp towel placed in
a warm dry room for one hour will then weigh more, less, or
about the same as before. Taylor asks, "Does 'the towel' 
include the water it holds?" The implication is that the more 
deeply a student is able to analyze such questions, the more 
complex and difficult to answer they become. 

Perhaps more importantly, Taylor, Schwartz, and other
writers take issue with the values underlying standardized
tests, for instance the assumption that memorization of the
names of concepts is the best indication of mastery of a
subject.

The critics in this issue of NEP maintain that reform must
go beyond the development of better test items. Houts
quotes Hoffmann, who calls multiple choice, machine-gradable
tests "insidious" because they "penalize the deep student,
dampen creativity, foster intellectual dishonesty, and
undermine the very foundations of education." While Hoffmann
believes that these tests may successfully be used for limited
types of testing (such as a driver's exam), he maintains that
they cannot successfully measure the most important
products of education like creativity or profundity.

Thomas and McKinney have noted that because most
standardized tests have been developed to correlate with
future performance, they are not always correlated with
present performance. This may seem confusing to those who
thought standardized tests were meant to test current
achievement. The contention is based on the fact that the
validity of standardized tests is often determined by how well
they "track" students; that is, how well they indicate which
students will perform well in the future.

Another problem involved with using standardized tests in
accountability is the inexactitude of their scoring. Krystal
and Henrie note that for any one test score there is a 25
percent probability that the score is too high or too low. Lazarus
points out that given the reliability range claimed by most
tests, even the most reliable tests give only a very rough idea
of student performance. As an example, he demonstrates that
a score of 550 on a widely used test with a .90 reliability
coefficient tells us, at best, only that the student probably falls
somewhere between the fiftieth and eighty-fourth percentile.
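
The way a reliability coefficient translates into a band of uncertainty around a single score can be sketched with the standard error of measurement from classical test theory, SEM = SD x sqrt(1 - reliability). The sketch below is illustrative only. The scale mean of 500 and standard deviation of 100 are assumptions, since the test Lazarus discusses is not named here, and the width of the band depends on the confidence level chosen, so the figures are not meant to reproduce his calculation.

```python
from math import sqrt
from statistics import NormalDist


def standard_error_of_measurement(sd, reliability):
    """Classical test theory: SEM = SD * sqrt(1 - reliability)."""
    return sd * sqrt(1.0 - reliability)


def score_band(score, sd, reliability, n_sems=1.0):
    """Return (low, high) bounds n_sems standard errors around an observed score."""
    sem = standard_error_of_measurement(sd, reliability)
    return score - n_sems * sem, score + n_sems * sem


# Assumed scale: mean 500, standard deviation 100 (hypothetical values).
SCALE_MEAN, SCALE_SD = 500.0, 100.0
norms = NormalDist(SCALE_MEAN, SCALE_SD)

low, high = score_band(550.0, SCALE_SD, reliability=0.90)
print(f"score band: {low:.0f} to {high:.0f}")
print(f"percentile band: {100 * norms.cdf(low):.0f} to {100 * norms.cdf(high):.0f}")
```

Even with a reliability of .90, a band of one standard error on either side of the observed score spans a wide stretch of the percentile scale, which is the general point Lazarus makes.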

Each score on a standardized test can be reported in three
ways: as a raw score, as a percentile rank, or as a grade
conversion. A raw score on a standardized test is merely a
meaningless number to most people. Grade level scores are easy to
understand, yet it would seem that they are too inexact to be
useful. Cronbach, a longtime authority on all types of testing,
states unequivocally: "grade conversions should never be
used in reporting on a pupil, or a class or in research." One
reason for this is that on some tests a student need answer
correctly only two, three, or four more questions on the
posttest than on the pretest to gain a full grade level. The
same criticism can be applied to percentile ranks.

Many accountability programs make educators
accountable for gains students make rather than for absolute levels of
performance. Many authors have criticized the inexactitude
of gain scores, which are usually obtained by subtracting the
pretest score from the posttest score. Stake has estimated
that on one widely used test "on the average, a student's
grade equivalent gain score will be in error 1.01 years." Weber
has concluded that most standardized tests are sensitive
enough only to measure gain scores over a period of at least
three years. Implementation problems are obvious.
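
One way to see why gain scores are less exact than the scores they are computed from is the classical test theory formula for the reliability of a difference score. The sketch below assumes that pretest and posttest scores have equal variances, and the reliabilities and the pretest-posttest correlation used in the example are hypothetical values chosen for illustration, not figures from Stake or Weber.

```python
def gain_score_reliability(r_pre, r_post, r_pre_post):
    """Reliability of (posttest - pretest), assuming equal score variances.

    r_pre, r_post: reliabilities of the pretest and the posttest.
    r_pre_post:    correlation between pretest and posttest scores.
    """
    if r_pre_post >= 1.0:
        raise ValueError("pretest-posttest correlation must be below 1")
    return ((r_pre + r_post) / 2.0 - r_pre_post) / (1.0 - r_pre_post)


# Hypothetical values: both tests have reliability .90 and
# pretest and posttest scores correlate .80.
print(round(gain_score_reliability(0.90, 0.90, 0.80), 2))
```

Under these assumed values, two tests that are each quite reliable yield a gain score whose reliability is only about half as high, because much of what remains after subtraction is measurement error.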

It is true, as Weber observes, that on standardized tests,
group gain scores are more valid than individual scores.
However, Olson points out that if the class mean on pretests and
posttests is used to measure performance, high performance
by a few can outweigh the poor performance of many. Thus,
by directing efforts at a small number of high achievers, a
teacher or program can produce impressive-looking gain
scores.
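
Olson's point about class means can be shown with a small worked example. The class of ten students and all of the scores below are hypothetical, invented only to illustrate how a mean gain can look favorable while most students make no progress.

```python
from statistics import mean, median


def summarize_gains(pretest, posttest):
    """Return mean gain, median gain, and the share of students who improved."""
    gains = [post - pre for pre, post in zip(pretest, posttest)]
    improved = sum(1 for gain in gains if gain > 0)
    return {
        "mean_gain": mean(gains),
        "median_gain": median(gains),
        "share_improved": improved / len(gains),
    }


# Hypothetical class of ten: two high achievers gain 20 points each,
# while the other eight students each lose a point.
pretest = [80, 85, 40, 42, 45, 47, 50, 52, 55, 58]
posttest = [100, 105, 39, 41, 44, 46, 49, 51, 54, 57]

summary = summarize_gains(pretest, posttest)
print(summary)
```

Here the class mean rises by more than three points even though only two of the ten students improved; reporting the median gain or the share of students who improved alongside the mean would expose the pattern.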

Another criticism of standardized testing is the allegation
that standardized tests are not valid for students who have
severe learning deficiencies. Rosenshine and McGaw
emphasize that achievement tests are designed for particular grade
levels, and thus scores, especially gain scores, are not valid
for students who begin above or below grade level.

The case against using standardized testing in
accountability programs is massive, no matter which definition of
accountability one chooses. Yet today these tests are still
widely used methods of assessment for accountability.

By spring 1974, 30 states had enacted some form of
accountability legislation. Of the 30 states that are now
required by law to implement accountability programs, 18 have
enacted state testing programs. Still others have enacted
programs utilizing testing. Standardized testing is specified by
law in at least nine of these programs.

There is very little information available about the details
of state accountability programs utilizing standardized test- 
ing. Indeed, it is difficult to ascertain whether these programs 
are being implemented at all. It seems likely that although 
everybody is talking about accountability, very few people 
are doing anything about it, or at any rate, many who are do- 
ing something about it aren't talking. 
CRITERION-REFERENCED TESTS:
AN EXPENSIVE ALTERNATIVE



More and more educators, including Popham, Lessinger,
and administrators of the National Assessment of Educational
Progress, are turning to criterion-referenced tests for
assessment purposes.

Ebel explains criterion-referenced tests this way: "The aim
of [criterion-referenced tests] is to determine how many and
which ones of a specified set of instructional objectives have
been attained." A criterion-referenced test in mathematics,
for instance, is divided into sections testing particular
components of math such as adding two-digit numbers or
multiplying fractions. Scores, given for each section, show whether
the student has mastered each particular component. These
tests are also called domain-referenced tests, mastery tests, or
objectives-based tests.

Popham and Husek offer this definition:
"Criterion-referenced measures are those which are used to ascertain an
individual's status with respect to some criterion, i.e.,
performance standard. It is because the individual is compared
with some established criterion, rather than other individuals,
that these measures are described as criterion-referenced."
The criterion used is often that of completing 80 percent of
the items on a given section correctly.
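
Scoring against a criterion rather than against other students can be sketched as follows. The function name mastery_report, the section names, and the item results are hypothetical; the 80 percent cutoff is the commonly used criterion mentioned above, treated here as an adjustable parameter.

```python
def mastery_report(section_results, criterion=0.80):
    """Map each section to True if the proportion of correct items meets the criterion.

    section_results: dict of section name -> list of booleans (item correct or not).
    """
    report = {}
    for section, answers in section_results.items():
        proportion_correct = sum(answers) / len(answers)
        report[section] = proportion_correct >= criterion
    return report


# Hypothetical results for one student on two ten-item sections.
results = {
    "adding two-digit numbers": [True] * 9 + [False],      # 9 of 10 correct
    "multiplying fractions": [True] * 7 + [False] * 3,     # 7 of 10 correct
}

for section, mastered in mastery_report(results).items():
    print(section, "mastered" if mastered else "not yet mastered")
```

The report refers only to the student's own performance on each objective; no other student's score enters the calculation.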

Thus far the meaning seems fairly straightforward.
However, there is a great deal of disagreement over the essential
difference between standardized or norm-referenced and
criterion-referenced tests. Shami and his colleagues point out
that criterion-referenced tests can also be standardized
(administered according to standard explicit instructions) and
normed (their scores can be compared to a normative group).

Rather than delve into the intricacies of a definitional
problem that may be mostly semantic, we will make the
same distinction Averch and his colleagues have made: a
norm-referenced test compares a student's accomplishment
with that of others; a criterion-referenced test indicates
whether a student has accomplished certain skills.



Advantages of Criterion-Referenced Tests

It seems clear that using criterion-referenced tests for
accountability avoids many of the problems encountered when
using norm-referenced tests. They are endorsed most strongly
by those whose definition of accountability includes the
precise stipulation and measurement of educational goals.

In a 1969 article Popham states, "High quality
instructional planning requires the explication of instructional
intents in terms of measurable learner behaviors."
Criterion-referenced tests are better than standardized tests at
measuring such behavior.

Advocates of increased community involvement in
education often prefer criterion-referenced tests because, unlike
standardized tests, they can be locally developed to reflect
local educational goals or objectives. If teachers, schools, or
state boards of education develop their own tests, they will
also be assured that the tests measure their unique textbooks
or programs. "The criterion-referenced test," Knipe and
Krahmer note, "is the only type of test that a school district can
use to determine if it is working toward its curriculum goals."

Because criterion-referenced tests measure specific skills
or knowledge, they are designed to do more than just
measure correlates of learning or compare students.
Criterion-referenced tests also have much more exact methods of
scoring than the percentile or grade conversion techniques
used on norm-referenced tests. Since students' scores merely
indicate what learning objectives have been mastered, it is
easy to calculate progress over time.

Another advantage of criterion-referenced tests is that they
can be used for individualized instruction of students at
many different levels. A teacher can decide which sections or
items of a test he or she wants a student to complete
according to the student's own level.

Disadvantages of Criterion-Referenced Tests



Although there are several ways that criterion-referenced
tests fulfill the needs of accountability programs better than
standardized tests, they present problems of their own. One
is cost, both in time and money. Since criterion-referenced
tests are useless if they do not reflect the particular
educational goals being set, many educators are finding it necessary
to develop their own tests. Some seem to be having success,
but for most the task seems gargantuan.

Morrissett puts it: "The production of valid well-structured
hierarchies of objectives and test items is not a task that can
be undertaken by a teacher meeting five classes a day, nor by
a Thursday afternoon curriculum committee." He points out
that the National Assessment of Educational Progress spends
$5 million per year developing items for use in just two or
three subject areas. Ebel maintains that there are few people
who have backgrounds that qualify them to develop valid,
reliable criterion-referenced tests.

A possible solution to this problem is for schools to choose
instructional objectives and tests that have been developed by
private firms or state departments of education. In this case,
a school may choose what it feels is a good test or selection
of test items and then design curricula to fit the items.
Popham in 1973 recommended the Los Angeles-based, nonprofit
Instructional Objectives Exchange for such items. Although
several other firms and state departments of education are
moving in this direction, it is not clear if good "item banks"
exist yet. In a paper published in 1974 on reading tests,
Hogan stated unequivocally: "well-developed criterion-referenced
tests are simply not available today."

Another problem arises from developing tests locally or
even on a statewide basis. Those developing tests may shy
away from setting high goals that seem "unrealistic"
compared to past achievement. A related problem is raised by
Krystal and Henrie: What happens if local special interest
groups gain too much control over formulating learning
objectives? Gubser gives an account of a new teacher
recertification program in Arizona that depends heavily on
students' answers on tests that many feel are based on
particular political beliefs and ideologies. A question that must
be dealt with when developing criterion-referenced tests is,
Are local goals always better than more widely held goals?

An additional problem sometimes found with
criterion-referenced tests is that they usually cannot be used to
compare students. Unless the tests are normed, the scores of these
tests, like the raw score on a standardized test, do not tell
anything about where a student stands nationally. For this
reason, some suggest the development of criterion-referenced
tests or, at least, criterion-referenced items that have also
been nationally normed. Grady recommends merely using
both types of tests.

Both Ebel and Haggart have noted another problem with
criterion-referenced tests. As Ebel states, "Emphasis on
discrete specifics may lead to neglect of the integration of ideas
that gives unity and solidarity to a subject." Perhaps
criterion-referenced tests will encourage students to collect
differentiated skills or small bits of knowledge at the sacrifice of
understanding underlying concepts or ideas.

In fact, many theorists, including Averch and his colleagues,
have voiced the fear that the most important goals of
education may be too broad and complex to test with
criterion-referenced tests. Combs maintains that one of our most
important educational goals is to create intelligent citizens
who are "creative, flexible, open to experience, responsible
to themselves and others and guided by positive goals and
purposes." He notes, however, that these types of goals are
"at odds with the specificity and precision demanded by most
persons operating in the behavioral-objective
performance-based criteria persuasion."

Combs further criticizes the behavioral objectives approach
on which criterion-referenced tests are based for being a
"closed system of thinking" because it allows only for planned
outcomes. It would be tragic indeed if schools restricted
themselves to teaching only those things that can be measured
by a criterion-referenced test.
State Testing Programs



At least 13 states now use criterion-referenced tests in
their statewide assessment programs, and there are indications
that more may soon follow suit. Accountability programs
using this type of testing are somewhat better reported than
programs using standardized tests. Three of the most widely
publicized programs are in Florida, California, and Michigan.

The Florida program, utilizing both criterion-referenced
and norm-referenced tests, is based on Florida's 1971
Educational Accountability Act. The criterion-referenced
component of the testing has thus far been devised by Florida
reading specialists and teachers who chose performance
objectives from a catalog provided by the Center for the
Study of Evaluation at the University of California at Los
Angeles. The program, projected through 1978, includes
plans to measure student performance in such diverse areas as
mental health and aesthetic appreciation as well as
communication and learning skills.

The California program is based on the 1972 Stull Act,
which requires each teacher to develop pupil performance
objectives and criterion-referenced tests as a basis for
evaluation of his or her work. In 1972-73 the San Diego Unified
School District responded to the act with a plan prepared by
teachers and principals for teacher evaluation based on
student performance on certain learning objectives. Although a
few other similar kinds of programs have been instituted in
California, it is unclear what kinds of programs most schools
in the state are instituting, or indeed if they are instituting
serious programs. A paper from the Institute for the
Development of Educational Activities notes, regarding California,
that "teachers and administrators consider that state's
accountability program 'a paper tiger.'"

The Michigan program, begun in 1970, is one of the pioneer
state accountability programs. It originally utilized
norm-referenced tests, but after two years replaced them with
criterion-referenced tests developed by the state board of
education, teachers, and administrators. At present the
program measures performance only in reading and math, but
plans are being made for testing in other areas. In the future,
the state plans to avoid spending the millions of dollars
necessary to test all students by testing only a representative
sample of students on most objectives. A 1974 National
Education Association-sponsored evaluation of this program
severely criticized it for using performance objectives that
purportedly were not field-tested or validated and that
penalized minority students. The NEA committee
recommended the use of local rather than statewide objectives.
WHY TEST AT ALL? 



Although some form of testing student performance is 
a component of almost all accountability programs, many 
writers suggest that all forms of testing, whether 
standardized or criterion-referenced, present more problems than 
they solve. 

Poor Measures of Good Education 

One argument against using test scores as major criteria in 
accountability programs is raised by Soar. He maintains that 
it makes little sense to make teachers responsible for students' 
test scores when there is no research to indicate that there is 
any correlation between teaching and test scores. 

Another often-cited argument is that tests currently 
available are culturally biased against minority students, who 
frequently have a different language, different experiences, 
and different ways of looking at the world than do the 
majority of students. Similarly, others contend that students' 
scores indicate less about the effectiveness of teaching and 
programs than about the socioeconomic backgrounds of the 
students. 

At bottom, this is not only an argument against the 
validity of testing techniques but also an argument against 
making teachers accountable for the academic performance of 
deprived students. The truth is that we know very little about 
how to raise the achievement rates of these students. How 
can we hold teachers accountable for doing what no one yet 
knows how to do? 

Another problem involved with using any type of gain 
scores as the main method of measuring whether educational 
goals have been met is the regression effect. The regression 
effect means that no matter what kind of test one uses, 
students who score high on a pretest will tend to score lower 
on the posttest, and students who score low on a pretest will 
tend to score higher on the posttest. The existence of this 
effect is not an argument against testing per se. It does mean 
that gain scores may be invalid unless they can be compared 
to those of a control group, and such comparison is often 
difficult and costly. 
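
The regression effect can be illustrated with a small simulation. The 
sketch below is only an illustration under stated assumptions, not a 
model taken from any of the programs discussed here: it assumes each 
student has a fixed level of achievement, that every test score is that 
level plus random measurement error, and that no learning at all takes 
place between pretest and posttest. The function name and all of the 
numbers (mean of 50, spread of 10, 10,000 students) are hypothetical. 

```python
import random
import statistics


def simulate_gain_scores(n_students=10000, noise_sd=10.0, seed=0):
    """Simulate pretest/posttest scores with NO real learning between tests.

    All numbers (mean 50, spread 10, noise_sd) are illustrative assumptions.
    """
    rng = random.Random(seed)
    # Each student has a fixed "true" achievement level.
    true_level = [rng.gauss(50, 10) for _ in range(n_students)]
    # Each test score is the true level plus independent measurement error.
    pretest = [t + rng.gauss(0, noise_sd) for t in true_level]
    posttest = [t + rng.gauss(0, noise_sd) for t in true_level]

    # Rank students by pretest score and take the bottom and top fifths.
    order = sorted(range(n_students), key=lambda i: pretest[i])
    fifth = n_students // 5
    low_group, high_group = order[:fifth], order[-fifth:]

    def mean_gain(group):
        # Average posttest-minus-pretest difference for a group of students.
        return statistics.mean(posttest[i] - pretest[i] for i in group)

    return mean_gain(low_group), mean_gain(high_group)


low_gain, high_gain = simulate_gain_scores()
print(f"Mean gain, lowest fifth on pretest:  {low_gain:+.1f}")
print(f"Mean gain, highest fifth on pretest: {high_gain:+.1f}")
```

Under these assumptions the lowest-scoring fifth shows an apparent gain 
and the highest-scoring fifth an apparent loss, even though nothing was 
taught. A control group drawn from the same pretest range would show 
the same spurious shift, which is why the comparison is needed. 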

Adverse Effects of Testing 

Critics warn that we must be very careful that 
achievement testing programs don't put so much pressure on students 
that there is a sacrifice of academic honesty. If educational 
excellence is measured only by students' scores on tests, both 
students and teachers may be tempted to cheat. Such an 
outcome may be especially likely if teachers and students are 
asked to produce more than they really can. Teachers may 
coach, encourage, or hurry students during a test or even go 
so far as to improperly score tests if under pressure. 

Another way that teachers may react to extreme pressure 
is by "teaching to the test." This means having students 
memorize the correct responses to the specific items on the 
test. Many have been critical of criterion-referenced tests for 
being easy to "teach to," partly because their items test 
mastery of specific performance objectives rather than broad 
general concepts and partly because teachers themselves 
often have a hand in making up test items. Of course, it is 
also possible to teach to a norm-referenced test if the teacher 
is able to obtain copies of the test before it is given. 

Glass and Wildavsky suggest that an answer to some of 
these problems is for an outside independent auditing agency 
to administer tests and see that they are fairly conducted. 
This policing of tests, however, does not lessen the extreme 
pressures that make teachers and students desperate enough 
to attempt cheating in the first place. 


Noncognitive Subject Areas 

The outcomes-oriented educator cleaves exclusively to 
objectives amenable to measurement. 

Popham 1969 

Measuring what we know how to measure is no substitute for 
measuring what we need to measure. 

Combs 

Many advocates of current testing procedures are 
nevertheless quick to admit that there are important educational 
goals that, as yet, cannot be measured by any tests. Ebel 
mentions that we are not currently able to test "interests, 
values, aspirations, attitudes or self concepts." Lessinger, 
in April 1975, noted that our tests cannot assess things like 
"insightful appreciation, understandings and flashes of 
insight." Soar notes that student characteristics like complex 
problem-solving ability or responsible citizenship behavior 
have growth rates that are so slow as not to be measurable. 

Some accountability programs demand the specification of 
educational outcomes that can be measured very precisely; 
yet some critics maintain that by concentrating on 
measurable outcomes we may be slighting the outcomes that are most 
important. Combs holds that our educational efforts ought to 
be toward creating people who exhibit "intelligent behavior, 
intelligent problem-solving, and good judgment." He also 
holds that it is important for students to discover the 
"personal meaning" of the knowledge they are learning. These 
things are not measured by either criterion-referenced or 
standardized tests. What this point of view suggests is that 
basing accountability programs solely on outcomes that are 
"testable" may cause educators to lose sight of the most 
important educational goals. 

But if we eliminate tests, how can we determine if 
educational goals are achieved? Must we then discard the whole 
concept of accountability as impossible to implement? 

Alternatives to Testing 

Some educators have pointed out that traditional forms of 
testing are not the only way to evaluate school or teacher 
effectiveness. In spite of his affinity for criterion-referenced 
tests, Lessinger, in a 1970 issue of Educational Technology, 
recommends that accountability can make use of "a variety 
of modes of attaining evidence. One thinks immediately of 
hearings, of juries or expert witnesses, of certified auditors, of 
petitions or the like. Education can make use of all these 
modes and can use such means of acquiring evidence as 
videotape and pupil performance in simulated real-life situations, 
to mention a few." 

Perrone reports that since 1972 the North Dakota Study 
Group on Evaluation has been examining current methods of 
assessment. Members of this group are moving toward the use 
of observation and daily record-keeping by teachers and 
students as good ways of measuring the important goals that 
tests cannot measure. A group at Educational Testing Service 
in Princeton is studying similar measures. 

In 1974 Hawes quoted a superintendent using such forms 
of evaluation: "We don't feel there's a testing program out 
today that measures what we believe is important to evaluate. 
As examples, we want to appraise students' attitudes toward 
learning and using what they've learned. And we want to 
assess the more creative aspects of the student's ability." 

Hoffmann, in an interview with Houts, calls for a return to 
the more individualized subjective forms of evaluation used 
before machine-graded tests. He suggests the development of 
testing in which concern is not just with the correct answer 
but with "the reasoning process used to arrive at the answer." 

Some authors have suggested making educators accountable 
for the process that occurs in the classroom rather than the 
product. In a 1974 article Aldrich recommends that schools 
be held "responsible for the environments which they create 
and foster for children." Instead of testing students, the 
school might evaluate things like materials and activities 
available, arrangement of time and space, and teaching skills. 

Others have suggested making teachers accountable only 
for "behaving professionally." Stocker notes the National 
Education Association recommendation that teachers be 
evaluated on responsibilities like "adequate academic 
preparation" and "knowledge of and concern for students." It is 
unclear what methods would be used for such evaluation, but 
they would not be concerned with student performance. 

The problem of developing alternatives to standardized 
testing is being tackled by representatives to a conference on 
standardized testing convened by the National Association of 
Elementary School Principals and the North Dakota Study 
Group on Evaluation in November 1975. Twenty-five leading 
national education associations, government agencies, and 
education groups called for investigation into the uses and 
impact of standardized tests in the schools and for the 
development of more fair and effective means of assessment. The 
group plans to meet in the spring of 1976 to discuss findings 
and further recommendations. 

Most alternative forms of evaluation have had so little 
application that it is hard to weigh their strengths and 
weaknesses. At the moment they seem to hold a great deal of 
promise, but it remains to be seen how much time and 
money they will cost. 

There are, however, a few schools that are using alternative 
forms of evaluation. These programs seem to stress the 
definition of accountability that includes reporting to the public on 
broad educational goals. The programs are not restricted to 
narrowly prescribed educational outcomes but instead 
concentrate on accurately reporting all kinds of student achievement. 

In a 1974 article Hawes tells the story of Devil's Lake, 
North Dakota, a 2,000-student system that evaluates student 
performance through daily teacher observation and 
note-taking, samples of students' work, and teacher inventories of 
children's skills and attitudes. This evaluation is reported to 
parents by means of personal interviews with the teacher. 

According to Aldrich's 1975 report, the Marcy School in 
Minneapolis utilizes an "internal evaluator" who evaluates 
the total learning environment by observation of children in 
the classroom. This technique is useful for those whose 
definition of accountability includes making educators 
accountable for "process," that is, what goes on in the classroom. 

The Prospect School in Vermont includes student journals 
of their daily activities as part of their assessment program. 
Carini, a staff member of the school, writes that this technique 
makes it possible "to report precisely to parents and others 
on growth of individual students." 
CONCLUSION 



An administrator faced with the decision of what method 
of evaluation to use for accountability will find that there are 
no easy answers. Most authorities on testing seem to agree 
that traditional standardized testing is not adequate. Yet 
there is still a great deal of disagreement about which other 
methods can do the job best. It seems clear that, for the 
time being at least, all the best methods of assessment and 
evaluation are going to involve a great deal of time and 
money. 

Administrators whose definition of accountability includes 
the stipulation and achievement of precise learning objectives 
will no doubt choose to assess student performance with 
criterion-referenced tests. Those concerned chiefly with 
assessing achievement of the broadest educational goals or 
with reporting the educational processes that occur in the 
classroom will experiment with the alternative forms of 
evaluation now being developed. 

The method of evaluation chosen depends on one's 
definition of accountability, and this, as we have said, depends on 
how one answers the question, What is good education? Each 
educator must ultimately find his or her own answer to this 
question. 